Partition refinement
Given a partition of {0,1,…,n-1} into disjoint sets \(C_1,\ldots,C_k\) and a set \(S\) modify the partition such that each set splits according to membership in S, i.e. produce the partition \(C_1\cap S, C_1\setminus S, \ldots, C_k\cap S, C_k\setminus S\). Find a data-structure representing partitions that allow this operation in time \(O(|S|)\) while preprocessing and space is \(O(n)\). See also wikipedia:partition refinement.
The general approach
Oh well, this post is about a data structure, which not among the simplest. What we want is to represent a partition as list of lists, every list representing a class of the partition.
When we refine \(C_1,\ldots,C_k\) by a set \(S\) (called pivot set), we need be careful to do it in time linear in the size of \(S\). We cannot loop over all sets \(C_j\) of the partition and replace them by \(C_j\cap S\) and \(C_j\setminus S\).
Instead we need to loop over elements of S. For each element \(i\in S\) we need to find the class C which contains i. Now we want to remove the item i from C, such that eventually it would become \(C\setminus S\) and insert it into a class that would eventually become \(C\cap S\). We call this class the split class of C.
So for every class C we need to to know if we created already such a split class, and if not, we need to create one when needed.
Hence we need a data structure that would keep track of
- a storage for all classes. We need to be able to loop over all classes.
- a structure for a class, that allows to loop over all elements and possibly link to a corresponding split class
- a structure for an element, that basically stores its value, and a link to its class
An implementation of refine would have the following structure.
- for every item i in the pivot set S
- let C be the class of i
- if there not yet a split class for C, then create one
- remove i from C and insert it into the split class of C
- for all classes C that have split
- remove the information that C has a corresponding split class
The last step ensures that during the next call to refine, the reminder of the class C, which is now \(C\setminus S\) can again split into a new class.
Double linked lists
The solution we propose allows to maintain an ordering on elements. This is called an ordered permutation refinement data structure. Elements of a class are ordered, and classes are ordered as well. When a class C splits into \(C\cap S\) and \(C\setminus S\), then C gets replaced by these two classes in that order.
For this purpose we will store classes and items in circular double linked lists.
With this data structure in mind we can define class and item objects, which are list items augmented with some attributes. An class has a pointer to the head of its item list, and a possible pointer to the corresponding split class. An item has a pointer to its class, and stores its value.
The data structure
Application
This datastructure finds is application in the twin problem, in the lexBFS traversal of a graph, and on the consecutive ones problem, to name a few.
Links
- An implementation by David Eppstein. It is much more readable, uses dictionaries and does not preserve an ordering on elements and classes.